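Python Basics, Lesson 4 -- String Handling

4.1 Strings

Strings are one of the most commonly used data types in Python. Any characters enclosed in quotation marks form a string.
Ways to create a string:
1. Single quotes.
2. Double quotes.
3. Triple quotes: as mentioned in the earlier section on comments, triple quotes are often used as multi-line comments in Python, but what they enclose is still a string.
If you assign a triple-quoted value to a variable, that variable's type is str.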
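
As a quick check of the three quoting styles, the sketch below builds a string each way and inspects the result. The variable names are illustrative only.

```python
# Three ways to write a string literal; the variable names are illustrative.
single = 'hello'
double = "hello"
triple = """line one
line two"""

# All three are ordinary str objects.
print(type(single), type(double), type(triple))

# Single and double quotes produce identical values.
print(single == double)

# A triple-quoted literal keeps the line break as a '\n' character.
print(repr(triple))
```

Single and double quotes are interchangeable; triple quotes are mainly useful when the text spans several lines.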

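4.2 String Encoding

In Python 3, a string (type str) is a sequence of Unicode characters, so it supports many languages, including Chinese.

Unicode itself is a character set. UTF-8 is one way of encoding Unicode characters into bytes, designed to save space.

UTF-8 stores each character in 1 to 4 bytes: common English letters take 1 byte, Chinese characters usually take 3 bytes, and only rare characters take 4 bytes.
The methods most often used for encoding and decoding in Python are:
Encode: encode (str to bytes)
Decode: decode (bytes to str)
Reference: https://www.zhihu.com/question/23374078/answer/69732605

In memory, text is handled as Unicode str. To send it over a network, it has to be converted to bytes.
In Python, a bytes literal is written as: b"xxx"
encode and decode convert back and forth between str and bytes.
Both accept an encoding name as an argument. The default is utf-8, so the argument can be omitted.
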

print("Example 1:", 'Python'.encode('ascii'))  # converted to bytes
print("Example 2:", '猿人学python'.encode('utf-8'))  # converted to bytes; note: Chinese cannot be encoded as ascii, that raises an error
print("Example 3:", b'\xe7\x8c\xbf\xe4\xba\xba\xe5\xad\xa6python'.decode())
print("Example 4:", b'Python'.decode())

print(type('Python'.encode('ascii')))  # the result is of type bytes
print(type('猿人学python'.encode('utf-8')))  # the result is of type bytes

Example 1: b'Python'
Example 2: b'\xe7\x8c\xbf\xe4\xba\xba\xe5\xad\xa6python'
Example 3: 猿人学python
Example 4: Python
<class 'bytes'>
<class 'bytes'>
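
To make the byte counts above concrete, here is a minimal sketch that compares character count with encoded length and shows what happens when ASCII is asked to encode Chinese text. The sample text is the one from the examples; the variable names are illustrative.

```python
# Sample text reused from the examples above; variable names are illustrative.
text = '猿人学python'

data = text.encode('utf-8')   # str -> bytes
print(len(text))              # number of characters (code points)
print(len(data))              # number of bytes after UTF-8 encoding

# Each Chinese character here takes 3 bytes, each ASCII letter 1 byte.
print(len('猿'.encode('utf-8')), len('p'.encode('utf-8')))

# ASCII cannot represent Chinese characters, so encoding raises an error.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('encode failed:', exc.reason)

# Decoding with the same codec restores the original string.
restored = data.decode('utf-8')
print(restored == text)
```

The 9 characters become 15 bytes: 3 characters at 3 bytes each plus 6 ASCII letters at 1 byte each.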

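4.3 Escape Characters

\ escape character & line continuation (a continuation when it is at the end of a line)
\r carriage return
\n newline
\t tab

1. Escape character & line continuation `\`

Line continuation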

libai = "日照香炉生紫烟, 遥看瀑布挂前川。\
飞流直下三千尺, 疑是银河落九天。"

print(libai)

a = 1
b = a \
+ 1

print(b)

日照香炉生紫烟, 遥看瀑布挂前川。飞流直下三千尺, 疑是银河落九天。
2

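Escaping: the character after the backslash loses its special function and stands only for itself

# lb = "Author:"李白" Courtesy name:"太白" Art name:"青莲居士"" -- syntax error

lb = "Author:\"李白\" Courtesy name:\"太白\" Art name:\"青莲居士\""
print(lb)
# Alternative: use single quotes for the outer layer. Double quotes outside with single quotes inside also works, as long as the two do not clash
lb2 = 'Author:"李白" Courtesy name:"太白" Art name:"青莲居士"'
print(lb2)
lb3 = '''Author:"李白" Courtesy name:"太白" Art name:"青莲居士"'''
print(lb3)

Author:"李白" Courtesy name:"太白" Art name:"青莲居士"
Author:"李白" Courtesy name:"太白" Art name:"青莲居士"
Author:"李白" Courtesy name:"太白" Art name:"青莲居士"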

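2. Carriage return `\r`

Effect: moves the cursor back to the start of the line, so in a terminal the text after `\r` overwrites what was printed before it. Other environments may display it differently.

libai2 = "日照香炉生紫烟, 遥看瀑布挂前川。\r飞流直下三千尺, 疑是银河落九天。"
print(libai2)

飞流直下三千尺, 疑是银河落九天。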

3. Newline `\n`


libai3 = "日照香炉生紫烟, 遥看瀑布挂前川。\n飞流直下三千尺, 疑是银河落九天。"
print(libai3)

日照香炉生紫烟, 遥看瀑布挂前川。
飞流直下三千尺, 疑是银河落九天。

4. Tab `\t`


print("ID Name Chinese Math English")
print('-'*20)
print("01 张三 99 88 0")
print("01 李四 92 45 93")
print("08 王五 77 82 100")

print("ID\tName\tChinese\tMath\tEnglish")
print('-'*36)
print("01\t张三\t99\t88\t0")
print("01\t李四\t92\t45\t93")
print("08\t王五\t77\t82\t100")
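
To see what the escape sequences in this section actually store in a string, the sketch below prints the same text normally and through repr(), and checks that each escape is a single character. The sample text is made up for illustration.

```python
# Made-up sample text containing a newline and a tab.
poem = "line one\nline two\tend"

print(poem)          # escapes are interpreted when the text is displayed
print(repr(poem))    # repr() shows them in their escaped form

# Each escape sequence is one character in the string, not two.
print(len("\n"), len("\t"), len("\\"))

# splitlines() breaks the text at line boundaries such as \n.
lines = poem.splitlines()
print(lines)
```

repr() is a handy way to find out whether a string contains invisible characters such as a trailing newline.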


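4.4 Raw Strings `r''`

Raw strings
In Python, a string literal prefixed with r (r'') is a raw string: backslashes in it are kept as literal characters instead of starting escape sequences.

path = 'D:\network\work'  # \n is read as a newline; \w is not a valid escape, so it is kept as is (recent Python versions warn about it)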
print(path)

pathr = r'D:\network\work'
print(pathr)

D:
etwork\work
D:\network\work
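
As a small sketch of the difference, the code below writes the same path once with doubled backslashes and once as a raw string, then compares the two. The path is the illustrative one from the example above.

```python
# The same illustrative path written two ways.
escaped = 'D:\\network\\work'   # every backslash escaped by hand
raw = r'D:\network\work'        # raw string: backslashes kept literally

print(escaped == raw)           # both contain exactly the same characters
print(len(raw))                 # 15 characters, no newline inside

# In a raw string, \n stays as two characters: a backslash and the letter n.
print(len(r'\n'), len('\n'))
```

Raw strings are most useful for Windows paths and regular expressions, where backslashes are frequent.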

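4.5 String Concatenation and Formatting

The + operator concatenates strings.

name = "张三"
age = 20
gender = 'male'
hobby = "basketball, video games"

print("My name is:" + name + "My age is:" + str(age) + "My gender is:" + gender + " My hobbies are:" + hobby)

My name is:张三My age is:20My gender is:male My hobbies are:basketball, video games

With many variables, however, concatenation becomes tedious, and a string cannot be concatenated directly with a number or any other non-string type unless it is converted with str() first. Besides +, there are three more convenient approaches (formatting):

String formatting is another way of building a string from parts, with its own syntax. Python has several commonly used formatting styles.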
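
The limitation on mixing types can be seen directly. This minimal sketch tries to add a str and an int, catches the resulting error, and then repeats the concatenation with an explicit str() conversion. The message text is illustrative.

```python
name = "张三"
age = 20

# Adding a str and an int directly raises TypeError.
try:
    message = "age: " + age
except TypeError as exc:
    message = None
    print("cannot concatenate:", exc)

# Converting explicitly with str() makes the concatenation valid.
message = "name: " + name + ", age: " + str(age)
print(message)
```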

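1. `%` placeholders

% marks a placeholder, and values that are not strings can fill it too.
The full form is %s, where s means the value is converted to a string and put in the placeholder's position. It accepts integers and floats as well.
If the value is an integer or a float and should be formatted as a number, use %d or %f instead.
Precision control for numbers: the form m.n (m is rarely used) sets the minimum width m and the number of decimal places n.
Note: if m is smaller than the width of the number itself, m has no effect.
.n limits the number of decimal places and rounds the value accordingly.

text1 = 'My name is %s.' % name
text2 = 'My name is %s and I am %s years old.' % (name, age)

score = 88.9
text3 = "My exam score is %s" % score  # format the object as a string
text4 = "My exam score is %d" % score  # format the object as a decimal integer (the fractional part is dropped)
text5 = "My exam score is %f" % score  # format the object as a float (6 decimal places by default)

text6 = "My exam score is %.2f" % score
text7 = "My exam score is %7.2f" % score  # the missing width is filled with leading spaces
text8 = "My exam score is %4.2f" % score  # m is smaller than the number's own width, so it has no effect

print(text1)
print(text2)
print(text3)
print(text4)
print(text5)
print(text6)
print(text7)
print(text8)

My name is 张三.
My name is 张三 and I am 20 years old.
My exam score is 88.9
My exam score is 88
My exam score is 88.900000
My exam score is 88.90
My exam score is   88.90
My exam score is 88.90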
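2. format placeholders

Curly braces serve as placeholders. Syntax: str.format()

(In the video course, writing a name inside {}, such as {name}, is actually passing a keyword argument. That has not been covered yet; if it is unclear, do not worry, it comes up later.)

Expressions are supported as arguments too.


txt1 = 'My name is {} and I am {} years old.'.format(name, age)
txt2 = 'My name is {name} and I am {age} years old.'.format(name='李四', age=15)  # the variables name and age themselves are not changed


print(txt1)
print(txt2)
print(name)
print(age)

# Expressions are supported too
test1 = 'My name is {}, I am {} and will soon be {}.'.format(name, age, age + 1)
print(test1)

# Note: new_age is computed from the variable age defined earlier (20); the keyword argument age=15 does not affect it
test2 = 'My name is {name}, I am {age} and will soon be {new_age}.'.format(name='李四', age=15, new_age=age + 1)
print(test2)

My name is 张三 and I am 20 years old.
My name is 李四 and I am 15 years old.
张三
20
My name is 张三, I am 20 and will soon be 21.
My name is 李四, I am 15 and will soon be 21.

3. f-string

f-strings were added in Python 3.6. In form, an f-string is a string literal prefixed with f (f'').

The {} in the string mark the fields to be replaced.

An f-string is not really a string constant but an expression that is evaluated at run time.


tft1 = f'My name is {name} and I am {age} years old'
tft2 = f'My name is {name} and I am {age} years old. After my birthday I will be {age + 1}'

print(tft1)
print(tft2)

My name is 张三 and I am 20 years old
My name is 张三 and I am 20 years old. After my birthday I will be 21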
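
The width and precision controls shown for % carry over to format() and f-strings: they go after a colon inside the braces. The sketch below reuses the sample score from the % examples; the variable names are illustrative.

```python
score = 88.9    # same sample value as in the % examples
name = "张三"

two_places = f"{score:.2f}"           # precision only
padded = f"{score:7.2f}"              # width 7; numbers are right-aligned by default
left = f"{name:<6}|"                  # left-align the name in a field of width 6
via_format = "{:.2f}".format(score)   # the same spec works with str.format()

print(two_places)
print(repr(padded))
print(left)
print(via_format == two_places)
```

The part after the colon is the same m.n idea as in `%7.2f`, written without the percent sign.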

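4.6 Common String Methods

1. Common functions

string.upper(): converts the lowercase letters in the string to uppercase
string.lower(): converts all uppercase letters in the string to lowercase
len(string): returns the length of the string
string.count(sub): counts how many times a substring occurs
string.index(sub): returns the index of the first (leftmost) occurrence of the substring
string.find(sub): returns the index of the first (leftmost) occurrence of the substring
Note: to get the last (rightmost) occurrence, use rindex or rfind, which search from the right.
Note: if the substring is not found, index/rindex raise an error (ValueError), while find/rfind return -1.
Note: string indices start at 0.
string.strip(): removes whitespace (spaces, \n, \t, and so on) from both ends of the string. An optional argument gives a set of characters to remove from both ends instead
string.split(sep, num): splits the string using sep as the separator and returns a list. If num is given, at most num splits are made, so the list has at most num+1 items
string.replace(str1, str2): returns a copy of the string with str1 replaced by str2.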

# upper & lower
hello_string = 'hello world'

print(hello_string.upper())
print(hello_string.upper().lower())

HELLO WORLD
hello world

# len & count
print(len(hello_string))  # the count includes punctuation and spaces
print(hello_string.count('wo'))

11
1

# index & find
print(hello_string.index('o'))  # string indices start at 0
print(hello_string.find('o'))

# To get the rightmost occurrence, use rindex or rfind, which search from the right
print(hello_string.rindex('o'))
print(hello_string.rfind('o'))

4
4
7
7

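# strip
hello_string = '\n hello world \n'
print(hello_string.strip())  # returns a new string; the variable itself is not changed

hello_string = 'hello world'
print(hello_string.strip('hlde'))  # removes any of the characters h, l, d, e from both ends until a different character is reached
print(hello_string)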

hello world
o wor
hello world

# split
hello_string = 'hello world'
print(hello_string.split('o'))  # returns the list of pieces: ['hell', ' w', 'rld']
hello_string = 'hello world hello world'
print(hello_string.split('o',2))  # splits only at the first two o's: ['hell', ' w', 'rld hello world']

['hell', ' w', 'rld']
['hell', ' w', 'rld hello world']


# replace
hello_string = 'hello world'
print(hello_string.replace('world','世界'))

hello 世界

# List all the methods available on a value of a given type
print(dir('hello_world'))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
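
Because find and index report a missing substring differently, code that searches has to handle the two cases differently. The sketch below shows both, plus a loop that uses find with a start position to collect every match. The search terms are illustrative.

```python
text = 'hello world'

# find returns -1 when the substring is absent, so test the result explicitly.
pos = text.find('python')
if pos == -1:
    print('find: not found')

# index raises ValueError instead, so wrap it in try/except.
try:
    text.index('python')
except ValueError:
    print('index: not found')

# The `in` operator is enough when only a yes/no answer is needed.
print('world' in text)

# find accepts a start position, which allows stepping through every match.
positions = []
start = text.find('o')
while start != -1:
    positions.append(start)
    start = text.find('o', start + 1)
print(positions)
```

A common mistake is writing `if text.find(x):`, which treats a match at index 0 as false and a miss (-1) as true; comparing against -1 avoids that.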

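2. Type checks

string.isspace() returns True if the string contains at least one character and only whitespace characters, otherwise False
string.isalnum() returns True if the string has at least one character and all characters are letters or digits, otherwise False
string.isalpha() returns True if the string has at least one character and all characters are letters, otherwise False
string.isdecimal() returns True if the string contains only decimal digits (such as 0-9), otherwise False
string.isdigit() returns True if the string contains only digits, which also covers characters such as the superscript ², otherwise False
string.isnumeric() returns True if the string contains only numeric characters, which also covers characters such as Chinese numerals, otherwise False
string.istitle() returns True if the string is title-cased (every word starts with an uppercase letter and the remaining letters are lowercase), otherwise False
string.islower() returns True if the string contains at least one cased character and all cased characters are lowercase, otherwise False
string.isupper() returns True if the string contains at least one cased character and all cased characters are uppercase, otherwise False


str0 = 'hello world'
str1 = '  '
str2 = 'helloworld2'
str3 = '12393²' 
str4 = '一万'
print(str1.isspace())
print(str0.isalnum())  # contains a space besides letters, so the result is False
print(str2.isalpha())  # contains a digit besides letters, so the result is False
print(str3.isdecimal())  # are all characters decimal digits 0-9?
print(str3.isdigit())  # the superscript ² is not a decimal digit, but it is a digit
print(str4.isnumeric())  # the Chinese numeral 一万 (ten thousand) consists of numeric characters, so the result is True

title = 'Space Joking'
print(title.istitle())
print(title.islower())
print(title.isupper())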

True
False
False
False
True
True
True
False
False
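
The three number-related checks are easy to mix up, so a side-by-side comparison may help. This sketch runs isdecimal, isdigit, and isnumeric on a few sample characters chosen for illustration.

```python
# Sample characters chosen for illustration: an ASCII digit, a superscript,
# a fraction, a Chinese numeral, and a letter.
samples = ['5', '²', '½', '一万', 'a']

rows = []
for s in samples:
    rows.append((s, s.isdecimal(), s.isdigit(), s.isnumeric()))

print('text  isdecimal  isdigit  isnumeric')
for text, dec, dig, num in rows:
    print(f'{text:<5} {dec!s:<10} {dig!s:<8} {num!s}')
```

The checks are nested: every string that passes isdecimal also passes isdigit, and every string that passes isdigit also passes isnumeric. When the goal is to decide whether int() will accept user input, isdecimal is the closest match of the three.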

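3. Searching and replacing

string.startswith(x, beg, end) checks whether the string starts with x. If beg or end is given, only that range is checked. Returns True if it does, otherwise False.
string.endswith(x, beg, end) checks whether the string ends with x. If beg or end is given, only that range is checked. Returns True if it does, otherwise False.
string.find(str, beg, end) checks whether str occurs in the string. If beg and end are given, only that range is searched. Returns the starting index if found, otherwise -1
string.index() works like find(), except that it raises an exception (ValueError) if str is not in the string.
string.replace(str1, str2) replaces str1 with str2 in the string.

# Searching and replacing


string = 'hello world'
# The range includes beg and excludes end, as in slicing. A negative end such as -1 counts from the end, so the check stops before the last character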

print(string.startswith('h',1,11))
print(string.startswith('h',0,11))

print(string.endswith('d',0,11))
print(string.endswith('d',0,10))

print(string.find('d',0,11))
print(string.index('d',0,11))

False
True
True
False
10
10
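
One way to reason about the beg and end arguments is to compare each call with the same check on a slice. The sketch below does that for startswith, endswith, and find; the sample string matches the one above and the variable name is illustrative.

```python
s = 'hello world'

# startswith(prefix, start) examines the same text as s[start:].
print(s.startswith('wor', 6), s[6:].startswith('wor'))

# end is exclusive: s[0:10] is 'hello worl', which does not end with 'd'.
print(s.endswith('d', 0, 10), s[0:10].endswith('d'))

# find still reports the index within the whole string, not within the slice.
print(s.find('o', 5), s[5:].find('o'))
```

Passing beg and end avoids creating a temporary slice, and find keeps returning positions that can be used directly on the original string.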

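4.7 String Indexing and Slicing

Index: the position of a character in the string, as in the following example:

name = 'abcdef'
# Note: string indices start at 0, and -1 refers to the last character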
print(name[0])
print(name[1])
print(name[2])
print(name[3])
print(name[4])
print(name[5])
print(name[-1])

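Slicing:

Slicing means taking a part of the object being operated on. Strings, lists, and tuples all support slicing.

Slice syntax: [start:stop:step]; when step is omitted, it defaults to 1

Note: the selected range is half-open: it begins at start and ends just before stop (stop itself is not included)

In other words, the left bound is included and the right bound is not.

All slicing examples in this lesson use strings.

print('-'*20)
name = 'abcdef'
print(name[0:3])  # characters at indices 0 to 2
print(name[:5])  # characters at indices 0 to 4
print(name[3:5])  # characters at indices 3 and 4
print(name[2:])  # from index 2 to the end
print(name[2:-1])  # from index 2 up to, but not including, the last character
print(name[0:])  # everything
print(name[:])  # everything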




print('-'*20)
name = 'abcdef'
print(name[0:3:2])  # step 2: every second character, starting at index 0
print(name[0:5:2])  # step 2: every second character, starting at index 0



print('-'*20)
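# If start is greater than stop and the step is the default 1, slicing moves forward from index 5 and can never reach index 1, so the result is an empty str
print(name[5:1])

# If start is greater than stop and the step is negative, characters are taken from back to front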
print(name[5:1:-1])
print(name[5:1:-2])


print('-'*20)
############# How can slicing be used to reverse a string? ###############
name = 'abcdef'
print(name[::-1])
print(name[-1::-1])

a
b
c
d
e
f
f
--------------------
abc
abcde
de
cdef
cde
abcdef
abcdef
--------------------
ac
ace
--------------------

fedc
fd
--------------------
fedcba
fedcba
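
As a small application of reversed slicing, the sketch below checks whether a word is a palindrome and shows how slicing and indexing differ when a position is out of range. The helper name is_palindrome is hypothetical, introduced only for this illustration.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards.

    Hypothetical helper, introduced only for this illustration.
    """
    return text == text[::-1]


name = 'abcdef'
print(name[::-1])    # reversed copy of the string
print(name[::2])     # every second character from the start
print(name[-3:])     # the last three characters
print(is_palindrome('level'), is_palindrome(name))

# Slicing never raises IndexError; out-of-range bounds are clipped.
print(name[2:100])

# Indexing a single position that does not exist does raise.
try:
    name[100]
except IndexError:
    print('index out of range')
```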

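Exercises

1. Convert the string "abcd" to uppercase
2. Find the position at which "cd" occurs in "abcd"
3. Given the string "a,b,c,d", split it on commas. What type is the result?
4. "{}喜欢{}".format("张三") raises an error; fix the code so that it runs correctly
5. string = "Python is good": replace Python with python and print the result
6. Given string="python字符串学习.py", write code that extracts the part before .py
7. "this is python": replace python with apple
8. "this is python": check in code whether the string starts with this
9. "this is python": capitalize the first letter of every word
10. "this is a book\n" has a newline at the end; remove it
11. "\nthis is a book\n" has a newline at both the start and the end; remove only the leading one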

# 1. Convert the string "abcd" to uppercase

print("abcd".upper())

# 2. Find the position at which "cd" occurs in "abcd"
print("abcd".find('cd'))

# 3. Given the string "a,b,c,d", split it on commas. What type is the result?
print(type("a,b,c,d".split(',')))

# 4. "{}喜欢{}".format("张三") raises an error; fix the code so that it runs correctly
print("{}喜欢{}".format("张三","张三"))

# 5. string = "Python is good": replace Python with python and print the result
string = "Python is good"
print(string.replace('Python','python'))

# 6. Given string="python字符串学习.py", write code that extracts the part before .py

string="python字符串学习.py"
print(string[0:-3])
print(string[0:-len('.py')])
print(string.split('.')[0])  # works only while the name contains a single dot

a_list= string.split('.')

end = len(a_list[-1]) + 1  # length of the extension plus the dot
print(string[:-end])


# 7. "this is python": replace python with apple

print("this is python".replace('python','apple'))

# 8. "this is python": check in code whether the string starts with this


print("this is python".startswith('this'))

# 9. "this is python": capitalize the first letter of every word

print("this is python".title())


# 10. "this is a book\n" has a newline at the end; remove it

print("this is a book\n".strip())

# 11. "\nthis is a book\n" has a newline at both the start and the end; remove only the leading one

print("\nthis is a book\n"[1:])  # "\n" is a single character, so the slice starts at index 1
print("-")

# dir('aaa') lists the available methods; 'lstrip' is the one that strips the left end only
print("\nthis is a book\n".lstrip())
print("-")

ABCD
2
<class 'list'>
张三喜欢张三
python is good
python字符串学习
python字符串学习
python字符串学习
python字符串学习
this is apple
True
This Is Python
this is a book
this is a book

-
this is a book

-
